Case Study on LLVM as suitable intermediate language for binary analysis
نویسنده
چکیده
Many binary analysis tools and compilers, instead of directly working on code, use an intermediate representation of it. The idea of this thesis is to use the well-tested intermediate representation from LLVM for binary analysis tasks. We take a look at McSema, a tool to translate x86 and x86_64 binaries to LLVM, describe its translation process in detail and additionally implement Python bindings for it. To practically test McSema, we present five examples of code we translate to LLVM and then recompile again. The last of these demos is an example on using KLEE, a symbolic execution engine for LLVM, on the code produced by McSema in order to successfully solve a CrackMe. We conclude that McSema’s translation approach provides a suitable way to extract functions from binaries to integrate them in other code or to analyse them using symbolic execution, as well as serving as a potential basis to implement an LLVM-based decompiler. We also compare it to Remill, another tool similar to McSema, which generates code that represents the assembly code more explicitly and VEX, the intermediate representation used in Valgind and Angr, which is also more close to the machine code.
منابع مشابه
Enabling sophisticated analyses of ×86 binaries with RevGen
Current state-of-the-art static analysis tools for binary software operate on ad-hoc intermediate representations (IR) of the machine code. Therefore, even though IRs facilitate program analysis by abstracting away the source language, it is hard to reuse existing implementations of analysis tools in new endeavors. Recently, a new compiler framework— LLVM— has emerged, together with many analys...
متن کاملLLVM Optimizations for PGAS Programs Case study: LLVMWide Pointer Optimizations in Chapel
PGAS programming languages such as Chapel, Coarray Fortran, Habanero-C, UPC and X10 [3–6, 8] support high-level and highly productive programming models for large-scale parallelism. Unlike messagepassing models such as MPI, which introduce nontrivial complexity due to message passing semantics, PGAS languages simplify distributed parallel programming by introducing higher level parallel languag...
متن کاملDecompilation to Compiler High IR in a binary rewriter
A binary rewriter is a piece of software that accepts a binary executable program as input, and produces an improved executable as output. This paper describes the first technique in literature to decompile the input binary into an existing compiler’s high-level intermediate form (IR). The compiler’s back-end is then used to generate the output binary from the IR. Doing so enables the use of th...
متن کاملAutomatic Generation of Assembly to IR Translators Using Compilers
Translating low-level machine instructions into higher-level intermediate representation (IR) is one of the central steps in many binary translation, analysis and instrumentation systems. Most of these systems manually build the machine instruction to IR mapping table needed for such a translation. As a result, these systems often suffer from two problems: (a) a great deal of manual effort is r...
متن کاملMetamorphic Code from LLVM IR Bytecode
Metamorphic software changes its internal structure across generations with its functionality remaining unchanged. Metamorphism has been employed by malware writers as a means of evading signature detection and other advanced detection strategies. However, code morphing also has potential security benefits, since it can serve to increase the “genetic diversity” of software. We have created a me...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017